Fairness and Welfare Quantification for Regret in Multi-Armed Bandits
Authors
Abstract
We extend the notion of regret with a welfarist perspective. Focusing on the classic multi-armed bandit (MAB) framework, the current work quantifies the performance of bandit algorithms by applying a fundamental welfare function, namely the Nash social welfare (NSW) function. This corresponds to equating an algorithm's performance to the geometric mean of its expected rewards, and leads us to study Nash regret, defined as the difference between the (a priori unknown) optimal mean among the arms and the algorithm's performance. Since NSW is known to satisfy fairness axioms, our approach complements the utilitarian considerations of average (cumulative) regret, wherein the algorithm is evaluated via the arithmetic mean of its expected rewards. The current work develops an algorithm that, given the horizon of play T, achieves a Nash regret of O(sqrt((k log T)/T)), where k denotes the number of arms in the MAB instance. Since, for any algorithm, the Nash regret is at least as large as the average regret (by the AM-GM inequality), the known lower bound on average regret holds for Nash regret as well. Therefore, our Nash regret guarantee is essentially tight. In addition, we develop an anytime algorithm with a Nash regret guarantee of O(sqrt((k log T)/T) log T).
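To make the two evaluation criteria concrete, here is a minimal Python sketch (all names are illustrative, not from the paper) that computes the Nash regret, based on the geometric mean of an algorithm's per-round expected rewards, alongside the usual average regret based on the arithmetic mean; the AM-GM inequality guarantees the former is never smaller.

```python
import numpy as np

def nash_and_average_regret(mu, expected_rewards):
    """Illustrative regret computations for a k-armed instance.

    mu               : true mean rewards of the k arms
    expected_rewards : E[reward pulled at round t] for t = 1..T (positive),
                       as induced by some bandit algorithm
    """
    mu_star = np.max(mu)                                  # a priori unknown optimal mean
    geo_mean = np.exp(np.mean(np.log(expected_rewards)))  # geometric mean
    ari_mean = np.mean(expected_rewards)                  # arithmetic mean
    nash_regret = mu_star - geo_mean                      # welfarist (NSW-based) regret
    avg_regret = mu_star - ari_mean                       # utilitarian regret
    return nash_regret, avg_regret

# Toy instance: by AM-GM, Nash regret >= average regret.
mu = np.array([0.9, 0.5, 0.3])
rewards = np.array([0.5, 0.9, 0.9, 0.3, 0.9])  # per-round expected rewards
print(nash_and_average_regret(mu, rewards))
```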
Similar resources
Bounded regret in stochastic multi-armed bandits
We study the stochastic multi-armed bandit problem when one knows the value μ(⋆) of an optimal arm, as well as a positive lower bound on the smallest positive gap ∆. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows ∆, and bound...
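As a rough illustration of how knowing μ(⋆) and a gap bound ∆ can cap exploration, here is a toy heuristic in Python (not the paper's randomized policy; all names and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_known_mu_star_policy(arms, mu_star, delta, T):
    """Toy heuristic exploiting knowledge of mu_star and the gap delta.

    Pull the empirically best arm; whenever its empirical mean drifts
    below mu_star - delta / 2, re-explore the least-sampled arm. This
    only illustrates how the extra knowledge can cut off exploration,
    and is not the policy analyzed in the paper.
    """
    k = len(arms)
    counts = np.zeros(k)
    sums = np.zeros(k)
    for t in range(T):
        if t < k:                      # pull each arm once to initialize
            i = t
        else:
            means = sums / counts
            best = int(np.argmax(means))
            if means[best] >= mu_star - delta / 2:
                i = best               # confident enough: exploit
            else:
                i = int(np.argmin(counts))  # otherwise, explore
        r = float(rng.random() < arms[i])   # Bernoulli reward
        counts[i] += 1
        sums[i] += r
    return sums.sum()

print(toy_known_mu_star_policy(arms=[0.9, 0.6], mu_star=0.9, delta=0.3, T=1000))
```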
Simple regret for infinitely many armed bandits
We consider a stochastic bandit problem with infinitely many arms. In this setting, the learner has no chance of trying all the arms even once and has to dedicate its limited number of samples only to a certain number of arms. All previous algorithms for this setting were designed for minimizing the cumulative regret of the learner. In this paper, we propose an algorithm aiming at minimizing th...
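A common baseline for simple-regret minimization with infinitely many arms, and a useful reference point for this line of work, is to subsample a fixed number of arms, spread the budget uniformly over them, and recommend the empirical best. A minimal sketch, with illustrative names and not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

def uniform_subset_baseline(sample_arm_mean, n_samples, m):
    """Simple-regret baseline with infinitely many arms (illustrative only).

    Draw m arms at random (their means come from a reservoir distribution),
    split the budget of n_samples uniformly among them, and recommend the
    empirically best one. Simple regret is the gap between the best possible
    mean and the mean of the recommended arm.
    """
    means = np.array([sample_arm_mean() for _ in range(m)])  # hidden truths
    pulls = n_samples // m
    estimates = np.array(
        [rng.binomial(pulls, mu) / pulls for mu in means]    # Bernoulli arms
    )
    recommended = int(np.argmax(estimates))
    return means[recommended]

# Reservoir: arm means drawn uniformly from [0, 1].
print(uniform_subset_baseline(lambda: rng.random(), n_samples=10_000, m=100))
```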
Bounded Regret for Finite-Armed Structured Bandits
We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms. We present a new algorithm for this general class of problems and show that under certain circumstances it is possible to achieve finite expected cumulative regret. We also give problem-dependent lower bounds on the cumulative regret showing that at least in special cases t...
Contextual Multi-Armed Bandits
We study contextual multi-armed bandit problems where the context comes from a metric space and the payoff satisfies a Lipschitz condition with respect to the metric. Abstractly, a contextual multi-armed bandit problem models a situation where, in a sequence of independent trials, an online algorithm chooses, based on a given context (side information), an action from a set of possible actions ...
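One standard way to exploit such a Lipschitz condition is uniform discretization: partition the context space into cells and run an independent UCB instance per cell, with the Lipschitz constant bounding the approximation error the binning introduces. The sketch below (illustrative names; not necessarily the paper's exact algorithm) shows the idea for contexts in [0, 1]:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

class DiscretizedContextualUCB:
    """Uniform-discretization sketch for Lipschitz contextual bandits.

    Contexts live in [0, 1]; the space is cut into n_bins cells and an
    independent UCB-style index is maintained in each cell. Illustrative
    only -- not the exact algorithm from the paper.
    """

    def __init__(self, n_arms, n_bins):
        self.counts = np.zeros((n_bins, n_arms))
        self.sums = np.zeros((n_bins, n_arms))
        self.n_bins = n_bins

    def select(self, context, t):
        b = min(int(context * self.n_bins), self.n_bins - 1)
        c, s = self.counts[b], self.sums[b]
        if (c == 0).any():
            return b, int(np.argmax(c == 0))   # try each arm in the cell once
        ucb = s / c + np.sqrt(2 * math.log(t + 1) / c)
        return b, int(np.argmax(ucb))

    def update(self, b, arm, reward):
        self.counts[b, arm] += 1
        self.sums[b, arm] += reward

# Usage: the payoff of each arm is Lipschitz in the context x.
agent = DiscretizedContextualUCB(n_arms=2, n_bins=10)
for t in range(1000):
    x = rng.random()
    b, a = agent.select(x, t)
    p = 0.5 + 0.4 * (x if a == 0 else 1 - x)   # Lipschitz payoff means
    agent.update(b, a, float(rng.random() < p))
```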
Staged Multi-armed Bandits
In conventional multi-armed bandits (MAB) and other reinforcement learning methods, the learner sequentially chooses actions and obtains a reward (which can be possibly missing, delayed or erroneous) after each taken action. This reward is then used by the learner to improve its future decisions. However, in numerous applications, ranging from personalized patient treatment to personalized web-...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i6.25829